Middlebury College’s Human Geography with GIS course (GEOG 0261) regularly assigns an analysis of “Flood Hazard Vulnerability in Vermont’s Mobile Homes” using QGIS. The GEOG 0261 analysis builds on Baker et al.’s 2011 study, “Rapid Flood Exposure Assessment of Vermont Mobile Home Parks Following Tropical Storm Irene” (see the bottom of this report for the formal citation).
In this report, I reproduce the analysis conducted in GEOG 0261; however, I use a code-based approach to spatial analysis in R instead of QGIS. The motivation for this study is to see whether a basic spatial analysis assignment geared towards beginner geography students can be reproduced with code. Additionally, I seek to improve the internal validity of the study by reducing the impact of a boundary distortion along the Connecticut River. Below is the background that GEOG 0261 students receive for the “Flood Hazard Vulnerability in Vermont’s Mobile Homes” assignment:
“Accurate assessment of risk is essential for effective response to any natural disaster. The methodologies used to assess risk can end up underestimating vulnerabilities. Tropical Storm Irene offers an example of inadequate assessment of risk, which then leads to inadequate planning for and response to a disaster. The storm inundated Vermont with unprecedented rainfall on August 28 and 29 of 2011. The storm destroyed 480 bridges and 960 culverts (where streams cross under a road), causing $350 million in road damage and cutting off road access to 13 mountain communities. Even Vermont’s emergency management offices were flooded! Some of the most affected people were living in mobile homes, whether on individual parcels of land or in mobile home parks. At least 130 mobile homes were destroyed and an additional 300 severely damaged (Figure 1). Our problem will evaluate assessments of flooding risks with a focus on mobile homes in Vermont. There are two different ways of assessing flooding risk in Vermont: one is by the federal agency, FEMA (the Federal Emergency Management Agency), and one by a state agency, the Vermont Rivers Program. The federal agency, FEMA, estimates flood risk in terms of inundation from rising water levels in stable river channels. Based on existing channels, FEMA hydrologists estimate the region of land that would be potentially flooded by a 1% (100-year) flood. The residents with mortgages in that region are required to purchase flood insurance. The state of Vermont’s River Corridors Program estimates flooding risk differently, using river corridors. After Irene, the state of Vermont recognized that the most damaging flooding in Vermont is not due to inundation but rather due to fluvial erosion: the erosion of riverbanks as the river channel widens or migrates to form new channels (Figure 1 and Figure 2). By this estimation, regions where rivers may erode and migrate to in the future are also at risk of flooding.”
There are five layers in this analysis, each from a different primary source. Primary data sources for the study include:
river_corridors.shp - polygon - EPSG: 32145. Vermont river corridor polygons, as defined by Flood Ready Vermont. This flood hazard approach includes streams (with a 50-foot buffer) and rivers with watersheds larger than 2km. The data file can be found on the Vermont Open GeoData Portal (http://geodata.vermont.gov/).
block_groups.shp - polygon - EPSG: 32145. Census block group polygons in southern Vermont, with data on housing. The data file was acquired from the US Census Bureau’s 2014-2018 American Community Survey (ACS).
You can find the full metadata for each of these variables in the data/metadata section of the repository.
I, the author of this reproduction study, have spent the past 3.5 years living in Vermont as a student. I am familiar with the flood risk Vermont faces, the geography of the state, and how the state makes data publicly accessible. I therefore have prior experience with the entirety of this study, although this is not a concern given that no statistical tests are conducted and no models are built. The goal of this study is simply to reproduce, in R, a study that is usually conducted in QGIS.
I was also a student in GEOG 0261 (formerly GEOG 0120) and completed this analysis in QGIS in January 2022.
Going into this study, I know that there is a boundary distortion error along the eastern edge of the state, along the Connecticut River, that compromises internal validity: the Vermont River Corridors shapefile does not include a river corridor model for the Connecticut River. I will attempt to estimate my own river corridor for the Connecticut River.
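One simple way to approximate a corridor for the Connecticut River is to buffer its centerline and append the result to the existing corridor polygons. This is a minimal sketch using the sf package, not the method used in this study; the centerline coordinates, the 300 m half-width, and the toy corridor polygon are all hypothetical.

```r
library(sf)

# Hypothetical Connecticut River centerline in EPSG:32145 (NAD83 / Vermont, metres)
ct_river <- st_sfc(
  st_linestring(rbind(c(500000, 30000), c(505000, 60000), c(510000, 90000))),
  crs = 32145
)

# Hypothetical corridor half-width in metres (an assumption, not a state standard)
corridor_half_width_m <- 300

# Buffer the centerline on both sides to approximate a corridor polygon
ct_corridor <- st_buffer(ct_river, dist = corridor_half_width_m)

# Toy polygon standing in for one feature of river_corridors.shp
existing_corridor <- st_sfc(
  st_polygon(list(rbind(c(450000, 40000), c(452000, 40000), c(452000, 42000),
                        c(450000, 42000), c(450000, 40000)))),
  crs = 32145
)

# Append the estimated corridor so later overlays include the Connecticut River
all_corridors <- c(existing_corridor, ct_corridor)

# A hypothetical mobile home 100 m east of the centerline falls inside the new corridor
mh_point <- st_sfc(st_point(c(505100, 60000)), crs = 32145)
lengths(st_intersects(mh_point, all_corridors))
```

A buffer treats the corridor as a fixed-width band; a real corridor model would follow valley and channel geometry, so the width is the key assumption to revisit.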
I also know going into this study that some towns have very small numbers of mobile homes. Because these counts are used as denominators when calculating percentages, they produce overly sensitive and inflated percentages for those towns. I do not plan to address this, as doing so would require calculating the area of towns and counties, which I did not have time for when completing this study.
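To see why small denominators are a problem, compare how much a single additional at-risk home moves the percentage in a small town versus a larger one. This base-R sketch uses the Sandgate (7 homes, 4 at risk) and Jamaica (100 homes, 49 at risk) counts from the town results in this report; the "one more home" scenario is hypothetical.

```r
# Town counts taken from the town results in this report
towns <- data.frame(
  town = c("Sandgate", "Jamaica"),
  mobile_home_count = c(7, 100),
  at_risk_count = c(4, 49)
)

# Percentage of a town's mobile homes that are at risk
pct <- function(at_risk, total) 100 * at_risk / total

# Percentage as reported
towns$pct_now <- pct(towns$at_risk_count, towns$mobile_home_count)

# Hypothetical: percentage if one more mobile home were classified as at risk
towns$pct_plus_one <- pct(towns$at_risk_count + 1, towns$mobile_home_count)

# Change in percentage points caused by a single home
towns$shift <- towns$pct_plus_one - towns$pct_now
round(towns$shift, 2)
# Sandgate shifts by about 14.29 points, Jamaica by 1 point
```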
Given the research design and primary data to be collected and/or secondary data to be used, discuss common threats to validity and the approach to mitigating those threats, with an emphasis on geographic threats to validity.
Lastly, the original QGIS study uses area-weighted reaggregation (AWR) to estimate the number of mobile homes at risk in the FEMA flood zones, based on ACS 2014-2018 survey data and assuming that mobile homes are evenly distributed across space within each areal unit. However, the GEOG 0261 course has repeatedly demonstrated that this approach is inaccurate for this research question, so I do not reproduce this part of the study. This is an instance of the modifiable areal unit problem.
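The even-distribution assumption is easiest to see with numbers: AWR multiplies an areal unit's mobile home count by the share of its area that lies inside the flood zone, regardless of where the homes actually sit. A minimal base-R sketch with entirely hypothetical values:

```r
# Hypothetical areal unit: 100 mobile homes, 20% of its area inside a FEMA flood zone
mobile_homes_in_unit <- 100
unit_area_km2 <- 50
flooded_area_km2 <- 10

# Area-weighted estimate assumes homes are spread evenly over the unit's area
awr_estimate <- mobile_homes_in_unit * flooded_area_km2 / unit_area_km2

# Hypothetical count from point data, e.g. if homes cluster along the river
true_at_risk <- 45

c(awr_estimate = awr_estimate, true_at_risk = true_at_risk)
```

If mobile homes cluster in valley bottoms near rivers, the area share understates exposure, which is why point data is preferred here.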
Describe all data transformations planned to prepare data sources for analysis. This section should explain, in as much detail as possible, how to transform data from their raw state at the time of acquisition or observation to the pre-processed state ready for the main analysis, including steps to check for and mitigate sources of bias and threats to validity. The method may anticipate contingencies, e.g. tests for normality and alternative decisions to make based on the results of the test. More specifically, it should cover all the geographic and variable transformations required to prepare the input data described in the data and variables section above to match the study’s spatio-temporal characteristics as described in the study metadata and study design sections. Visual workflow diagrams may help communicate the methodology in this section.
Examples of geographic transformations include coordinate system transformations, aggregation, disaggregation, spatial interpolation, distance calculations, zonal statistics, etc.
Examples of variable transformations include standardization, normalization, constructed variables, imputation, classification, etc.
Be sure to include any steps planned to exclude observations with missing or outlier data, to group observations by attribute or geographic criteria, or to impute missing data or apply spatial or temporal interpolation.
[Figure: FEMA flood zone map (results/figures/FEMA_flood_zone_map.pdf)]
[Figure: River corridor map (results/figures/river_corridor_map.pdf)]
##
## A AE AO
## 567 1832 2
## Simple feature collection with 2401 features and 2 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 424802.5 ymin: 25228.79 xmax: 523718.9 ymax: 158609.9
## Projected CRS: NAD83 / Vermont
## # A tibble: 2,401 × 3
## FLD_ZONE geometry flood
## * <chr> <MULTIPOLYGON [m]> <lgl>
## 1 AE (((446276.7 124090.2, 446286.8 124096.6, 446295.7 124103.9, 4… TRUE
## 2 AE (((446494.3 124089.5, 446493.5 124098, 446491.4 124108.5, 446… TRUE
## 3 AE (((444273.9 123609.2, 444286.7 123605.9, 444296.5 123602.8, 4… TRUE
## 4 AE (((442981.8 122314.2, 442982.7 122310.9, 442986.9 122303.6, 4… TRUE
## 5 AE (((448164.3 123952.9, 448168.3 123952.4, 448171.6 123951.9, 4… TRUE
## 6 AE (((444449.3 123728.5, 444452.7 123721.8, 444458.8 123713.4, 4… TRUE
## 7 A (((453734.8 130105.7, 453734.8 130137.5, 453734.8 130172.4, 4… TRUE
## 8 AE (((443409.4 123216.8, 443410.1 123219.9, 443412.3 123223.9, 4… TRUE
## 9 AE (((448766.4 123935.4, 448768.5 123934.6, 448772.2 123933.4, 4… TRUE
## 10 AE (((448860.4 123953.6, 448859.9 123957.1, 448854 123964.4, 448… TRUE
## # ℹ 2,391 more rows
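The `flood` column printed above flags the FEMA zones kept for the analysis. A minimal base-R sketch of that filtering step, assuming the zone codes come from the FLD_ZONE attribute and that the A, AE and AO zones tabulated above are the ones retained; the attribute values are hypothetical, and the real analysis works on the sf layer rather than a plain data frame.

```r
# Hypothetical FLD_ZONE values standing in for the FEMA flood hazard attribute table
fema <- data.frame(FLD_ZONE = c("AE", "X", "A", "AE", "AO", "X", "D"))

# Zone codes treated as flood hazard areas, matching the zones tabulated above
flood_codes <- c("A", "AE", "AO")

# Logical flag like the `flood` column in the printed output
fema$flood <- fema$FLD_ZONE %in% flood_codes

# Keep only flood hazard rows and tabulate them by zone
fema_flood <- fema[fema$flood, , drop = FALSE]
table(fema_flood$FLD_ZONE)
```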
## # A tibble: 4 × 2
## county number_of_MHs
## * <chr> <dbl>
## 1 Bennington 1277
## 2 Rutland 1992
## 3 Windham 1833
## 4 Windsor 2427
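County totals such as number_of_MHs can be obtained by summing block-group estimates within each county. A minimal base-R sketch, assuming a block-group table with a county column and the ACS mobileHU field; the values are invented and do not reproduce the totals above.

```r
# Hypothetical block-group records; the mobileHU values are invented for illustration
bg <- data.frame(
  county   = c("Bennington", "Bennington", "Rutland", "Windham", "Windham"),
  mobileHU = c(120, 35, 210, 80, 64)
)

# Sum ACS mobile home estimates within each county
mh_by_county <- aggregate(mobileHU ~ county, data = bg, FUN = sum)
names(mh_by_county)[2] <- "number_of_MHs"
mh_by_county
```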
I exclude column 2, which was created using area-weighted reaggregation of mobile homes. The GEOG 0261 instructors found that this approach is less accurate than using the e911 point data to identify mobile homes at risk, so I do not reproduce the AWR approach.
NOTE: I also attempted to sum the number of e911 mobile home points for each county. This yielded a different number of mobile homes than Column 1 indicates. Because the ACS measurement of mobileHU is a survey-based estimate, it does not represent the true number of mobile home structures. This is a source of geographic uncertainty in this analysis, specifically an issue of spatial heterogeneity and construct validity.
I proceed with the Total Number of Mobile Homes column from the ACS data to stay consistent with the GEOG 0261 analysis, but this may be worth changing in the future.
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
## # A tibble: 4 × 2
## county mobile_home_count
## * <chr> <int>
## 1 Bennington 189
## 2 Rutland 164
## 3 Windham 299
## 4 Windsor 198
## # A tibble: 4 × 2
## county mobile_home_count
## * <chr> <int>
## 1 Bennington 130
## 2 Rutland 204
## 3 Windham 298
## 4 Windsor 353
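The county counts above come from overlaying e911 mobile home points with each flood risk layer. One way to do this with sf is to keep the points that intersect the flood polygons and then attach each point's county. This is a sketch with hypothetical squares and points; the study's own code may use a different overlay (the warning printed above is typical of sf overlays such as st_intersection()).

```r
library(sf)

# Helper to build a square polygon (toy geometry in EPSG:32145 metres)
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Two hypothetical counties side by side
counties <- st_sf(county = c("A", "B"),
                  geometry = st_sfc(square(0, 0, 100), square(100, 0, 100), crs = 32145))

# One hypothetical flood polygon straddling the county line
flood <- st_sf(FLD_ZONE = "AE", geometry = st_sfc(square(50, 0, 100), crs = 32145))

# Four hypothetical e911 mobile home points
mh <- st_sf(id = 1:4,
            geometry = st_sfc(st_point(c(10, 10)), st_point(c(60, 10)),
                              st_point(c(120, 10)), st_point(c(180, 10)), crs = 32145))

# Keep points inside any flood polygon, then attach the county each one falls in
mh_at_risk <- st_join(st_filter(mh, flood), counties)

# Count at-risk mobile homes per county
counts <- aggregate(id ~ county, data = st_drop_geometry(mh_at_risk), FUN = length)
names(counts)[2] <- "mobile_home_count"
counts
```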
## # A tibble: 4 × 6
## county number_of_MHs MHs_at_risk_FEMA MHs_at_risk_River_Co…¹ FEMA_rate RC_rate
## <chr> <dbl> <int> <int> <dbl> <dbl>
## 1 Benni… 1277 189 130 0.148 0.102
## 2 Rutla… 1992 164 204 0.0823 0.102
## 3 Windh… 1833 299 298 0.163 0.163
## 4 Winds… 2427 198 353 0.0816 0.145
## # ℹ abbreviated name: ¹MHs_at_risk_River_Corridors
Unplanned deviation for reproduction: I calculated “risk rates” to indicate what proportion of each county’s mobile homes lie within the FEMA flood zones and the River Corridors, respectively. The two rates are nearly identical for Windham County. In Bennington County, the FEMA risk rate is higher than the River Corridor risk rate, while in Windsor and Rutland Counties the River Corridor risk rate is higher than the FEMA risk rate.
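The risk rates are simple proportions: the at-risk count divided by the county's ACS total. A base-R sketch that recomputes them from the counts printed in the table above:

```r
# County totals and at-risk counts as printed in the table above
risk <- data.frame(
  county = c("Bennington", "Rutland", "Windham", "Windsor"),
  number_of_MHs = c(1277, 1992, 1833, 2427),
  MHs_at_risk_FEMA = c(189, 164, 299, 198),
  MHs_at_risk_River_Corridors = c(130, 204, 298, 353)
)

# Share of each county's mobile homes inside each risk area
risk$FEMA_rate <- risk$MHs_at_risk_FEMA / risk$number_of_MHs
risk$RC_rate <- risk$MHs_at_risk_River_Corridors / risk$number_of_MHs

# Positive values mean the River Corridors flag a larger share than FEMA
risk$rate_difference <- risk$RC_rate - risk$FEMA_rate
risk[, c("county", "FEMA_rate", "RC_rate", "rate_difference")]
```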
Combining the FEMA flood zones and River Corridors casts a wider net than using either layer individually, because it maximizes the number of mobile homes identified as at risk. At this stage of the analysis, we care more about which towns have the highest vulnerability of mobile homes to flooding than about whether the VT River Corridor or FEMA flood zone approach is more accurate. Including both flood risk metrics is therefore a safer way to ensure we identify all mobile homes at some level of flood risk.
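When combining the two layers, a home that falls in both should be counted once. A minimal sf sketch with hypothetical, partly overlapping polygons and points, flagging a home as at risk if it intersects either layer:

```r
library(sf)

# Helper to build a square polygon (toy geometry in EPSG:32145 metres)
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Hypothetical FEMA flood zone and River Corridor polygons that partly overlap
fema <- st_sfc(square(0, 0, 100), crs = 32145)
rc   <- st_sfc(square(50, 0, 100), crs = 32145)

# Hypothetical mobile homes: FEMA only, both layers, River Corridor only, neither
mh <- st_sfc(st_point(c(20, 10)), st_point(c(75, 10)),
             st_point(c(130, 10)), st_point(c(300, 10)), crs = 32145)

# A home is at risk if it falls in either layer; OR-ing avoids double-counting overlaps
in_fema <- lengths(st_intersects(mh, fema)) > 0
in_rc   <- lengths(st_intersects(mh, rc)) > 0
at_risk <- in_fema | in_rc

sum(in_fema) + sum(in_rc)  # naive sum counts the overlapping home twice
sum(at_risk)               # combined count
```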
## Simple feature collection with 10 features and 4 fields
## Geometry type: POLYGON
## Dimension: XY
## Bounding box: xmin: 437815 ymin: 36309.08 xmax: 509676.9 ymax: 148890.4
## Projected CRS: NAD83 / Vermont
## # A tibble: 10 × 5
## town mobile_home_count geometry at_risk_count
## <chr> <int> <POLYGON [m]> <dbl>
## 1 Woodford 27 ((446813.5 41030.5, 446912.2 4277… 23
## 2 Woodstock 70 ((488855 117635, 493730 128803.1,… 48
## 3 Sandgate 7 ((438054.6 74666.82, 438157.2 768… 4
## 4 Jamaica 100 ((469989.1 68179.99, 470183.9 717… 49
## 5 Windsor 63 ((504000.3 113907.5, 507400.5 113… 30
## 6 Killington 13 ((470120.8 124335.6, 471672.5 128… 6
## 7 Pittsfield 9 ((469974.3 148890.4, 472846.2 147… 4
## 8 Plymouth 18 ((476628.5 111137.1, 477790.3 119… 8
## 9 Wilmington 94 ((465280.3 39651.14, 465348 41403… 38
## 10 Proctor 15 ((454734.2 131983.4, 455376.5 132… 6
## # ℹ 1 more variable: pct_mh_at_risk <dbl>
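The town ranking is a percentage computation followed by a sort. A base-R sketch using the counts printed above, where at_risk_count is the number of homes in either the FEMA flood zones or the River Corridors:

```r
# Town counts as printed above
towns <- data.frame(
  town = c("Jamaica", "Killington", "Pittsfield", "Plymouth", "Proctor",
           "Sandgate", "Wilmington", "Windsor", "Woodford", "Woodstock"),
  mobile_home_count = c(100, 13, 9, 18, 15, 7, 94, 63, 27, 70),
  at_risk_count = c(49, 6, 4, 8, 6, 4, 38, 30, 23, 48)
)

# Percentage of each town's mobile homes that fall in a risk area
towns$pct_mh_at_risk <- 100 * towns$at_risk_count / towns$mobile_home_count

# Rank towns from highest to lowest percentage, breaking ties alphabetically
ranked <- towns[order(-towns$pct_mh_at_risk, towns$town), ]
head(ranked, 10)
```

Breaking ties by name keeps the order deterministic; Pittsfield (4 of 9) and Plymouth (8 of 18) have identical percentages.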
Results differ slightly between this analysis and the GEOG 0261 results because I downloaded the towns.shp file directly from the VT Open GeoData Portal, while the GEOG 0261 class uses a pre-cleaned towns shapefile provided by the instructors. I could not use the provided file because it was corrupted. The provided layer distinguishes between Rutland Town and Rutland City, whereas the layer I downloaded merges the two into a single town of “Rutland.” Rutland Town has a pct_mh_at_risk value of 57.89, so if my towns file separated out the town portion, it would be among the top 10 towns with the highest share of mobile homes at risk.
[Figure: Percentage of mobile homes at risk by town (results/figures/pct_mh_at_risk_by_town.pdf)]
To investigate the discrepancy between the two metrics, I created a layer representing the difference between the two polygon layers and plotted it over a satellite basemap. Notably, the River Corridors include fewer lakes, ponds, and large rivers. This is significant because these water bodies can still cause severe flooding if an influx of water causes them to overflow their banks. Additionally, the Connecticut River is not included in the River Corridors at all.
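The difference layer can be built with st_difference(). A minimal sf sketch with hypothetical square layers, computing the area flagged by FEMA but not by the River Corridors, and vice versa:

```r
library(sf)

# Helper to build a square polygon (toy geometry in EPSG:32145 metres)
square <- function(x0, y0, side) {
  st_polygon(list(rbind(c(x0, y0), c(x0 + side, y0), c(x0 + side, y0 + side),
                        c(x0, y0 + side), c(x0, y0))))
}

# Hypothetical FEMA flood zone and River Corridor layers
fema <- st_sfc(square(0, 0, 100), crs = 32145)
rc   <- st_sfc(square(50, 0, 100), crs = 32145)

# Dissolve each layer, then subtract one from the other; working with bare
# geometries (sfc) carries no attributes, so the "spatially constant" warning is not raised
fema_only <- st_difference(st_union(fema), st_union(rc))
rc_only   <- st_difference(st_union(rc), st_union(fema))

st_area(fema_only)
st_area(rc_only)
```

Dissolving first matters with real layers: without it, st_difference() is applied pairwise across features and can return overlapping fragments.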
## Warning: attribute variables are assumed to be spatially constant throughout
## all geometries
[Interactive map: difference between the FEMA flood zone and River Corridor layers over a satellite basemap]